Abstract

Given an instance of an NP-hard problem, how hard is it to compute a (possibly infeasible)
solution x, such that x is guaranteed to agree with some feasible solution x* in at least half its
bits? Such questions about "structural" approximability are motivated by applications such as
Computer Tomography, in which one wants to reconstruct as much of the full structure of the
solution as possible. In this spirit, Feige et al. [1] (following Kumar and Sivakumar [4]) show
that, for some $\epsilon > 0$, given an instance $\Psi$ of 3-SAT, it is NP-hard to compute an assignment
x that agrees with any satisfying assignment x* of $\Psi$ in at least $n/2 + n^{1-\epsilon}$ of the n variables.
They show similar negative results for other natural NP-complete problems. Guruswami and
Rudra [2] strengthen their bounds to $n/2 + n^{2/3+\epsilon}$ (for all fixed $\epsilon > 0$).

The main result in this paper is as follows. For the "universal" NP-complete language U,
for any positive $\epsilon$, it is NP-hard to compute an x that agrees with a witness x* in at least
$n/2 - \epsilon\sqrt{n \log n}$ bits. In contrast to previous results, this is less than half the bits. This result
extends to randomized algorithms, for which it is essentially tight.

We also give improved negative results for several natural NP-complete problems, as well as
the first positive (algorithmic) results for Vertex Cover, Independent Set, Clique, and U.

1 Introduction

Consider the discrete tomography problem. An instance is specified by numerous two-dimensional
x-rays, formed by x-raying a three-dimensional object along various directions. A solution is a 
discrete function describing the internal structure of the object (specifying the density of matter 
at each point in the object). Given sufficiently many x-ray images, the internal structure may 
be determined uniquely, yet computing it exactly is (in general) NP-complete. What form of 
approximate solution is appropriate in this context? One kind of approximate solution, typical in 
the study of approximation algorithms, would be an object that, if subject to x-rays from the same 
perspectives, would yield approximately the same x-ray images (in other words, a solution to the 
underlying constrained optimization problem that meets the constraints approximately). But such
a solution can have a completely different internal structure than any true solution. In this context, 
where the goal is to discover as much about the internal structure of the object as possible, this 
form of approximation is unsuitable. 

The Hamming distance to a true solution would be a more appropriate metric for approximation. 
For example, if the discrete tomography problem models an object with just two possible local
densities {0, 1}, the problem can be modeled as a system of linear equations over a vector x of
variables taking values in {0, 1}. A solution is specified by an assignment of values to the variables.
A "good" approximate solution x would be one that agrees with some true solution x* in many
variables; that is, an x with small Hamming distance to the set of true solutions. This paper is
about the computational complexity of computing such an x, for various NP-complete problems.

Definitions. Here we follow [HH], with minor variations. Every language L in NP is characterized
by a witness relation $\mathcal{R}_L$, such that $L = \{x : (x, w) \in \mathcal{R}_L \text{ for some } w\}$ and $(x, w) \in \mathcal{R}_L$ is decidable
in time polynomial in $|x|$. We call w such that $(x, w) \in \mathcal{R}_L$ a witness for $x \in L$.

An algorithm A achieves Hamming distance h(n) (with respect to L and $\mathcal{R}_L$) if, on any input
(x, n) where $x \in L$ has some witness of size n, the algorithm outputs a string A(x) of size n having
Hamming distance at most h(n) to some such witness. If A is a randomized algorithm, we say that
A achieves Hamming distance h(n) with probability p(n) if the algorithm outputs such a string with
probability at least p(n).

Note that small Hamming distance is synonymous with large agreement: x achieves Hamming
distance h(n) from x* iff it agrees with x* on at least $n - h(n)$ bits.
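The definitions above can be made concrete with a minimal sketch. The function names and the representation of witnesses as Python strings of '0' and '1' are illustrative assumptions, not notation from the paper.

```python
def hamming_distance(x: str, w: str) -> int:
    """Number of positions where two equal-length bit strings differ."""
    if len(x) != len(w):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(x, w))


def agreement(x: str, w: str) -> int:
    """Number of positions where the strings agree: n minus the distance."""
    return len(x) - hamming_distance(x, w)


def achieves(x: str, witnesses: list[str], h: int) -> bool:
    """True if x is within Hamming distance h of at least one witness."""
    return any(hamming_distance(x, w) <= h for w in witnesses)


print(hamming_distance("0110", "0011"), agreement("0110", "0011"))
```

Note that `achieves` only needs one witness to be close, matching the phrase "to some such witness" in the definition.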

We use U and $\mathcal{R}_U$ to denote a universal NP-complete language and its witness relation, respectively.
Specifically, $\mathcal{R}_U$ contains encoded tuples (M, x, b, w) such that (1) M is (the encoding of) a
deterministic Turing machine, x, b, and w are binary strings, and (2) M accepts input (x, w) within
$|b|^2$ steps (b is "padding"). Then U contains encoded tuples (M, x, b) such that $(M, x, b, w) \in \mathcal{R}_U$
for some witness w.

Here are some basic facts about U.

Observation 1 U is NP-complete.

The proof is standard, and we omit it.

Also, U is as hard to Hamming-approximate as any other NP language:

Observation 2 Suppose, for U and $\mathcal{R}_U$, that there is a polynomial-time algorithm $A_U$ that achieves
Hamming distance h(n) with probability p(n).

Let L be any language in NP, and let $\mathcal{R}_L$ be any polynomial-time witness relation for L.

Then (for L and $\mathcal{R}_L$) there is a polynomial-time algorithm $A_L$ that achieves Hamming distance
h(n) with probability p(n).

The proof, which is straightforward, is in the Appendix. 

Previous results. Hardness of Hamming approximation was previously studied by Feige et al. [1]
and Guruswami and Rudra [2]. Feige et al. show, for many natural NP-complete problems, that
achieving Hamming distance much less than n/2 (a natural threshold for binary encodings) is
NP-hard. Specifically, Feige et al. show (for some $\delta < 1$, for many standard NP-complete problems,
including SAT) that it is NP-hard to achieve Hamming distance less than $n/2 - n^{\delta}$. They extend
the result to randomized algorithms: no randomized polynomial-time algorithm achieves Hamming
distance less than $n/2 - n^{\delta}$ with probability at least $1/n^c$ (for any fixed c) unless RP=NP. A
motivating application was to give evidence that a SAT algorithm by Schöning could not be sped up
in a particular way, but Feige et al. were also motivated by related work by Kumar and Sivakumar
[4] concerning error-correcting codes.

Guruswami and Rudra [2] use error-correcting codes to strengthen the hardness results of Feige
et al. Specifically, they show that, for every NP problem, and every $\epsilon > 0$, there is a formulation of
the problem for which it is NP-hard to achieve Hamming distance less than $n/2 - n^{2/3+\epsilon}$, reducing
the exponent in the subtracted term from just less than 1 down to about 2/3.

New results. We start with some fairly straightforward observations. 

• The lower bounds of Feige et al., and of Guruswami and Rudra, can be strengthened, reducing
the exponent in the subtracted term to any fixed $\epsilon > 0$, for most of the NP-complete problems
studied by Feige et al.: no deterministic polynomial-time algorithm achieves Hamming
distance less than $n/2 - n^{\epsilon}$ unless P=NP.

• Likewise, for any positive $\epsilon$ and c, no randomized polynomial-time algorithm achieves Hamming
distance less than $n/2 - n^{\epsilon}$ with probability $1/2 + 1/n^c$ unless RP=NP.

A main point of the above observations is that they don't rely on error-correcting codes; trivial 
amplification arguments (closely related to the arguments by Feige et al.) suffice to prove them. 

For Vertex Cover, Independent Set, and Clique, we observe the following positive results: 

• For unweighted Vertex Cover, unweighted Independent Set, and unweighted Clique, there are 
polynomial-time algorithms that achieve Hamming distance n/2. 

These observations follow easily from a known combinatorial property of Vertex Cover. These are 
the first non-trivial positive results that we are aware of. 

The threshold of n/2 is a natural reference point for Hamming approximation, given binary 
encodings. For example, a random string x achieves Hamming distance n/2 from any x* in expectation
(and close to n/2 with high probability). Also, for any string x, it is guaranteed that either
x or its complement will have Hamming distance at most n/2 from any x*. Given these intuitions,
and the hardness results and the algorithmic results above, one might expect a priori that, for all
NP problems, an x with Hamming distance n/2 or less from some solution x* should be computable
in polynomial time. Our main contribution, next, is to prove that this intuition is false:

• For U, for any $\epsilon > 0$, no polynomial-time algorithm achieves Hamming distance less than
$n/2 + \sqrt{\epsilon n \ln n}$ unless P=NP. (That is, one can't even get $n/2 - \sqrt{\epsilon n \ln n}$ of the bits right.)

• Further, no randomized polynomial-time algorithm achieves Hamming distance less than
$n/2 + \sqrt{\epsilon n \ln n}$ with probability at least $1 - O((n^{2\epsilon+1}\sqrt{\epsilon \ln n})^{-1})$ unless RP=NP.

Thus, for U, in contrast to Vertex Cover, Independent Set, and Clique, one cannot achieve Hamming 
distance n/2 in polynomial time (unless P=NP). 

The intuition behind the proof of the main result is as follows. Given a particular x, of the $2^n$
potential witnesses (potential assignments to x*), the fraction that lie within Hamming distance
$n/2 + \sqrt{\epsilon n \ln n}$ of x is $1 - n^{-c}$ (for some constant c depending on $\epsilon$). Thus, assuming there is
a polynomial-time algorithm to find such an x, with a single call to that algorithm, the number
of potential witnesses can be reduced by a factor of $1 - n^{-c}$. By iterating, say, $n^{c+1}$ times (and
carefully recoding the set of potential witnesses each time), the number of potential witnesses can
be reduced from $2^n$ to $2^n (1 - n^{-c})^{n^{c+1}}$, which is less than $\exp(n - n^{-c} n^{c+1}) = O(1)$. Then, each of
the O(1) remaining potential witnesses can be tested using the polynomial-time verifier for U.
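The counting step in this intuition can be spelled out as a short worked inequality. It uses only the standard bound $1 - t \le e^{-t}$ and the quantities named in the paragraph above:

```latex
\begin{align*}
2^n \left(1 - n^{-c}\right)^{n^{c+1}}
  &\le 2^n \exp\!\left(-n^{-c} \cdot n^{c+1}\right)
     && \text{since } 1 - t \le e^{-t} \\
  &= 2^n e^{-n} = (2/e)^n < 1 .
\end{align*}
```

So after $n^{c+1}$ rounds, each shrinking the candidate set by a factor of $1 - n^{-c}$, only a constant number of candidates can remain.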

For comparison, here are some complementary (and easy) positive results. For any NP language,
a trivial deterministic polynomial-time algorithm achieves Hamming distance at most $n - c$ (for
any fixed c > 0). (This is a weak upper bound, but it is the best possible for any so-called black-box
algorithm.)

Likewise, the naive randomized algorithm (guess a random string) achieves Hamming distance
at most $n/2 + \sqrt{\epsilon n \ln n}$ with probability $1 - O((n^{2\epsilon}\sqrt{\epsilon \ln n})^{-1})$. This shows that the randomized
hardness result for U is essentially tight.

2 Improving the hardness results of Feige et al. 

This section describes how to improve many of the hardness results of Feige et al., to show, for
several of the NP-complete problems they consider, that (for any $\epsilon > 0$) achieving Hamming distance
at most $n/2 - n^{\epsilon}$ in polynomial time is impossible unless P=NP. The proofs are elementary padding
arguments, similar in spirit to the arguments of Feige et al.

We will use the following standard observation:

Observation 3 (a) If the following problem has a polynomial-time algorithm, then P=NP: Given
a 3-SAT formula, find a feasible value for the first variable in the formula, if one exists.

(b) If there is a randomized polynomial-time algorithm that solves any n-variable instance of
the above problem with probability $1/2 + 1/n^c$ for any fixed c > 0, then RP=NP.

If the formula is satisfiable, a feasible value for the variable is one that is consistent with some
satisfying assignment. If the formula is not satisfiable, any value may be returned.

Part (a) holds by standard arguments. To see why part (b) is true, note that the randomized
algorithm could be used to find a satisfying assignment of a given formula with high probability:
to determine the likely value of the first variable, run the randomized algorithm, say, $n^{2c+1}$ times,
then take the majority value; by a Chernoff bound, this standard amplification trick boosts the
probability of finding a feasible value to at least $1 - 1/n^2$. Then substitute the likely feasible value
for the variable, simplify the formula, then recurse. This would find a full satisfying assignment
with probability at least $1 - O(1/n)$ in polynomial time, showing that RP=NP.
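The amplification and self-reduction just described can be sketched as follows. This is only an illustration under stated assumptions: `weak_oracle` is a hypothetical stand-in for the assumed randomized algorithm (it cheats by brute force and then lies with a fixed probability), clauses are tuples of nonzero integers in the DIMACS style, and passing the partial assignment `fixed` plays the role of "substitute and simplify".

```python
import itertools
import random


def satisfies(clauses, assignment):
    """True if every clause has a literal made true by `assignment` (dict var -> bool)."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in c) for c in clauses)


def extendable(clauses, n, fixed):
    """Brute force: can the partial assignment `fixed` be extended to a satisfying one?"""
    free = [v for v in range(1, n + 1) if v not in fixed]
    return any(
        satisfies(clauses, {**fixed, **dict(zip(free, bits))})
        for bits in itertools.product([False, True], repeat=len(free))
    )


def weak_oracle(clauses, n, fixed, var, rng, advantage=0.2):
    """Hypothetical stand-in for the assumed algorithm: answers with a feasible
    value for `var` with probability 1/2 + advantage, and its negation otherwise."""
    feasible_value = extendable(clauses, n, {**fixed, var: True})
    return feasible_value if rng.random() < 0.5 + advantage else not feasible_value


def majority_value(clauses, n, fixed, var, rng, repetitions):
    """Amplify the weak oracle by majority vote over independent runs."""
    votes = sum(weak_oracle(clauses, n, fixed, var, rng) for _ in range(repetitions))
    return 2 * votes > repetitions


def find_assignment(clauses, n, rng, repetitions=301):
    """Fix variables one at a time to their likely feasible values, then verify."""
    fixed = {}
    for var in range(1, n + 1):
        fixed[var] = majority_value(clauses, n, fixed, var, rng, repetitions)
    return fixed if satisfies(clauses, fixed) else None


rng = random.Random(0)
print(find_assignment([(1, 2, 3), (-1, 2), (-2, 3), (-3, -1)], 3, rng))
```

The final verification step is what makes the error one-sided: an assignment is returned only if it really satisfies the formula.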

We start by showing the hardness of Hamming-approximation for 3-SAT: 

Observation 4 Suppose that, for 3-SAT (with the natural witness relation), there exists $\epsilon > 0$
such that some polynomial-time algorithm A achieves Hamming distance $n/2 - n^{\epsilon}$. Then P=NP.

Proof: The proof is an elementary amplification argument.

Assume the algorithm A in the statement of Observation 4 exists. Given any 3-SAT formula
$\Psi$, we compute, in polynomial time, a feasible value for the first variable V as follows.

To $\Psi$, add $\lceil n^{1/\epsilon} \rceil$ copies of V. Specifically, add new clauses
$(V_1 = V) \wedge (V_2 = V) \wedge \cdots \wedge (V_{\lceil n^{1/\epsilon} \rceil} = V)$
(where a = b is shorthand for $(a \vee \bar{b}) \wedge (\bar{a} \vee b)$, and $V_1, V_2, \ldots, V_{\lceil n^{1/\epsilon} \rceil}$ are new variables). This
gives a formula $\Psi'$ with $n' = n + \lceil n^{1/\epsilon} \rceil$ variables, essentially preserving any satisfying assignments,
but forcing the $\lceil n^{1/\epsilon} \rceil$ added variables to take the same value as V in any satisfying assignment.

Run A on $\Psi'$, and let x' be the returned value. If $\Psi$ is satisfiable, then x' achieves Hamming
distance at most $n'/2 - (n')^{\epsilon}$ to an assignment x* satisfying $\Psi'$. That is, x' agrees with x* on
at least $n'/2 + (n')^{\epsilon} > n^{1/\epsilon}/2 + n$ variables. To do so, even if x' agrees with x* on all n of the
original variables, it would still have to agree with x* on more than half ($n^{1/\epsilon}/2$) of the duplicates of
V. Thus, the majority value of the duplicate variables in x' (true or false, whichever x' assigns to
more duplicates) must be the value that x* assigns to V. This value must also be a feasible value
for V in $\Psi$ (if one exists).

Thus, if A exists, then one can compute a feasible value for V in $\Psi$ (if one exists) in polynomial
time. By Observation 3 then, P=NP. □
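The padding construction in this proof can be sketched in a few lines. The clause encoding (tuples of signed integers, with variable 1 playing the role of V) and the function names are illustrative assumptions; the assumed algorithm A itself is not modelled, only the construction and the majority decoding.

```python
import math


def pad_formula(clauses, n, eps):
    """Add m = ceil(n**(1/eps)) copies of variable 1 to an n-variable formula.

    Returns (padded_clauses, n_prime, copies), where `copies` lists the new variables.
    """
    m = math.ceil(n ** (1 / eps))
    copies = list(range(n + 1, n + m + 1))
    padded = list(clauses)
    for c in copies:
        padded.append((c, -1))   # (V_i or not V)
        padded.append((-c, 1))   # (not V_i or V)
    return padded, n + m, copies


def decode_first_variable(x_prime, copies):
    """Majority value that the assignment x_prime (dict var -> bool) gives the copies."""
    true_votes = sum(x_prime[c] for c in copies)
    return 2 * true_votes > len(copies)


padded, n_prime, copies = pad_formula([(1, 2), (-1, 3)], 3, 0.5)
print(n_prime, len(padded))
```

With n = 3 and eps = 0.5 the padded formula has 12 variables, so an output within distance 12/2 - sqrt(12) (about 2.5) of a satisfying assignment can be wrong on at most two copies, and the majority over the nine copies still recovers the value of V.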



Observation 5 Suppose that, for SAT (with the natural witness relation), there exists $\epsilon > 0$
such that some randomized polynomial-time algorithm A achieves Hamming distance $n/2 - n^{\epsilon}$ with
probability $1/2 + 1/n^c$ for any fixed c > 0. Then RP=NP.

Proof: Assume the algorithm A exists, and use it just as in the preceding proof. Given the formula
$\Phi$ with n variables, call A on the formula $\Phi'$ with O(n') variables, where $n' = n^{1/\epsilon}$ (n being the
number of variables in $\Phi$). By the properties of A assumed in the observation, the probability that
the call succeeds in finding a feasible value for V (if one exists) is $1/2 + 1/(n')^c = 1/2 + 1/n^{c/\epsilon}$.
Thus, by Observation 3 (b), RP would equal NP. □

Next we sketch how the same idea applies to other problems. 

Observation 6 Suppose that, for Vertex Cover, Independent Set, or Clique, there exists $\epsilon > 0$
such that some polynomial-time algorithm A achieves Hamming distance $n/2 - n^{\epsilon}$. Then P=NP.

Proof: We sketch the proof for Vertex Cover. The rest follow via the standard reductions from
Vertex Cover to Independent Set and from Independent Set to Clique, as these reductions preserve
Hamming approximation.

Suppose such an algorithm A exists. Given a graph G = (V, E) with n vertices, a non-isolated
vertex $w \in V$, and the minimum size, k, of any vertex cover in G, we will use A to determine in
polynomial time either (i) that G has a size-k vertex cover containing w, or (ii) that G has a size-k
vertex cover not containing w. By standard arguments, if this can be done in polynomial time,
then P=NP.

Determine (i) or (ii) as follows. Construct graph G' from G by adding a copy w' of w (with
edges to all neighbors of w), and a path P (of new vertices) connecting w to w', so that |P| (the
number of edges in P) is even and roughly equals $n^{1/\epsilon}$. Let n' = n + |P| be the number of vertices
in G', and let k' = k + |P|/2.

Denote the successive vertices on path P as $w = v_0, v_1, v_2, \ldots, v_{|P|} = w'$. Let $P_0$ contain the
"even" vertices $v_2, v_4, \ldots, v_{|P|} = w'$ (not including w). Let $P_1$ contain the "odd" vertices
$v_1, v_3, \ldots, v_{|P|-1}$.

Run A on the instance (G', k') and let x' be the output. Define the Hamming distance of x'
to $P_i$ ($i \in \{0, 1\}$) to be the number of vertices $v \in P$ such that $(x'_v = 1) \ne (v \in P_i)$. Return "(i)
G has a size-k vertex cover containing w" if the Hamming distance from x' to $P_0$ is less than the
Hamming distance from x' to $P_1$. Otherwise, return "(ii) G has a size-k vertex cover not containing
w."

To finish the proof we sketch why this procedure is correct. Let C' be any minimum-size vertex
cover of G'. By standard arguments, C' has size k' and one of two cases holds:

Case (1) $C' = C \cup P_0$ where C is a size-k vertex cover of G and $w \in C$, or

Case (2) $C' = C \cup P_1$ where C is a size-k vertex cover of G and $w \notin C$.






By assumption, x' has Hamming distance at most $n'/2 - (n')^{\epsilon}$ to the witness x* for some such
C'. That is, x' agrees with x* on at least $n'/2 + (n')^{\epsilon} > n^{1/\epsilon}/2 + n$ vertices. Thus, focusing just
on vertices in P, x' agrees with x* on more than half of the vertices in P.

If Case (1) above occurs (for the C' that x* represents), then the Hamming distance from x'
to $P_0$ must be less than |P|/2, so the Hamming distance from x' to $P_1$ must be more than |P|/2,
so the algorithm returns "(i) G has a size-k vertex cover containing w". This is correct, given that
Case (1) occurs.

By a similar argument, if Case (2) occurs, then the algorithm returns "(ii) G has a size-k vertex
cover not containing w", which is correct in this case. □

Just as Observation 4 extended to randomized algorithms, the above argument extends to prove
the following observation:

Observation 7 Suppose that, for Vertex Cover, Independent Set, or Clique, there exist positive $\epsilon$
and c such that some randomized polynomial-time algorithm A achieves Hamming distance $n/2 - n^{\epsilon}$
with probability $1/2 + 1/n^c$. Then RP=NP.

Observation 8 Suppose that, for Directed Hamiltonian Cycle (with the witness being a subset of
the edges that forms the Hamiltonian cycle), there exists $\epsilon > 0$ such that some polynomial-time
algorithm A achieves Hamming distance $n/2 - n^{\epsilon}$. Then P=NP.

Proof: By definition, any polynomial-time reduction from 3-SAT to Directed Hamiltonian Cycle
works as follows. Given a 3-SAT formula $\Phi$, the reduction produces, in polynomial time, a directed
graph G = (V, E) such that G has a Hamiltonian cycle if and only if $\Phi$ is satisfiable; further,
given any Hamiltonian cycle C in G, the reduction describes how to compute an assignment A(C)
satisfying $\Phi$ in polynomial time.

There exist well-known reductions (e.g., see [6]) such that G and A(·) have the following
further properties. For any variable V in $\Phi$ there are a pair of edges (u, v) and (v, u) such that,
for any Hamiltonian cycle C, either C contains (u, v) and A(C) assigns V = true, or C contains
(v, u) and A(C) assigns V = false.

Assume the algorithm A from the observation exists. We describe below how to modify any
reduction with the above properties so as to solve the following problem in polynomial time: given
a 3-SAT formula $\Phi$, determine a feasible value for the first variable in $\Phi$ (if any exists). As in the
proof of Observation 4, this is enough to prove P=NP.

Given $\Phi$, apply the reduction with the above properties to compute the graph G = (V, E). Then,
for the first variable, say, Z in $\Phi$, let (u, v) and (v, u) be the two edges in G for Z as described above.
Replace the edges (u, v) and (v, u), respectively, with paths $P_0 = (u = w_0, w_1, w_2, \ldots, w_k, w_{k+1} = v)$
and $P_1 = (v, w_k, w_{k-1}, \ldots, w_1, u)$, where $w_1, w_2, \ldots, w_k$ are new vertices and $k = n^{1/\epsilon}/2$, where
n = |E| is the number of edges in G. Say that each edge $(w_i, w_{i+1})$ is a duplicate of (u, v), and that
each edge $(w_{i+1}, w_i)$ is a duplicate of (v, u). Let $P = P_0 \cup P_1$ be the set of all duplicate edges. Call
the resulting graph G', and let $n' = n + n^{1/\epsilon}$ be the number of edges in G'.

Run the algorithm A on G', and let x' be the output. Define the Hamming distance from x' to
$P_i$ (for i = 0, 1) to be the number of edges e in P such that $(x'_e = 1) \ne (e \in P_i)$. If the Hamming
distance from x' to $P_0$ is less than the Hamming distance from x' to $P_1$, then return "(i) The value
True is feasible for the variable Z" and otherwise return "(ii) The value False is feasible for the
variable Z".

To finish, we prove that this procedure determines a feasible value for Z, if there is one.

Assume that there is a feasible value for Z (that is, that $\Phi$ is satisfiable). The output x' of A
then has Hamming distance at most $n'/2 - (n')^{\epsilon}$ to some witness x* for some Hamiltonian cycle C'
in G' (where $x^*_e = 1$ iff $e \in C'$). That is, x' agrees with some C' on at least $n'/2 + (n')^{\epsilon} > n^{1/\epsilon}/2 + n$
edges. Thus, x' agrees with C' on strictly more than $n^{1/\epsilon}/2$ (half) of the edges in P. By the
properties of the reduction, one of two cases holds:

Case 1. $C' \cap P = P_0$, and True is a feasible value for Z. In this case, the Hamming distance from
x' to $P_0$ must be less than |P|/2, so the Hamming distance from x' to $P_1$ must be more than
|P|/2, so the procedure returns "(i) The value True is feasible for the variable Z", which is
correct.

Case 2. $C' \cap P = P_1$, and False is a feasible value for Z. By similar reasoning, the procedure is
correct in this case as well.

□ 

As do the previous observations, Observation 8 extends to randomized algorithms:

Observation 9 Suppose that, for Directed Hamiltonian Cycle, there exist positive $\epsilon$ and c such that
some randomized polynomial-time algorithm A achieves Hamming distance $n/2 - n^{\epsilon}$ with probability
$1/2 + 1/n^c$. Then RP=NP.

3 Algorithms for Vertex Cover and related problems 

This section presents the positive results for Vertex Cover and related problems. 

Observation 10 There are polynomial-time algorithms achieving Hamming distance n/2 for the 
(unweighted) Vertex Cover, Independent Set, and Clique problems. 

The algorithms for Independent Set and Clique work by standard reductions to Vertex Cover. 
The algorithm for Vertex Cover is based on a classic result of Nemhauser and Trotter. 

Theorem 1 ([5, 3]) Fix any instance I of Weighted Vertex Cover. Let y be any (minimum cost)
basic feasible solution to the linear program relaxation of the standard integer linear program for I.
Then, for each vertex v, the variable $y_v$ has value in {0, 1/2, 1}, and there exists a minimum-weight
vertex cover C* that has the following property. For each vertex v, if $y_v = 0$, then $v \notin C^*$, while if
$y_v = 1$, then $v \in C^*$.

The algorithm that achieves Hamming distance n/2 for unweighted Vertex Cover is as follows:
Given the instance (a graph G = (V, E) and an integer k), compute the minimum-cost basic feasible
solution y referred to in the theorem, then take the Hamming approximation C to be the set of
vertices v such that $y_v > 0$.

Since the basic feasible solution y can be computed in polynomial time, the algorithm clearly 
runs in polynomial time. Next we argue that this C has Hamming distance at most n/2 from the 
minimum-size vertex cover C* referred to in the theorem. (Note that, as long as the instance (G, k) 
is feasible, this C* will be a witness.) 

Since there is a feasible solution of cost |C*| to the linear program, while y is a minimum-cost
solution, it follows that $|C^*| \ge \sum_v y_v$. The choice of C implies $\sum_v y_v \ge |C|/2$. Thus, $|C^*| \ge |C|/2$.
The choice of C also implies that $C^* \subseteq C$. This and $|C^*| \ge |C|/2$ imply that C is within
Hamming distance |C|/2 (which is at most n/2) from C*.

This proves the correctness of the algorithm for Vertex Cover. 
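The algorithm can be tried on small graphs with a minimal sketch. Two loud assumptions: instead of solving the linear program for a basic feasible solution, `optimal_half_integral` brute-forces a minimum-cost vector in {0, 1/2, 1}^n (feasible only for tiny n), and the sketch relies on the property in Theorem 1 also holding for that vector. All function names are hypothetical.

```python
import itertools
from fractions import Fraction


def optimal_half_integral(n, edges):
    """Brute-force a minimum-cost solution of the vertex cover LP over {0, 1/2, 1}^n."""
    values = [Fraction(0), Fraction(1, 2), Fraction(1)]
    best = None
    for y in itertools.product(values, repeat=n):
        if all(y[u] + y[v] >= 1 for u, v in edges):
            if best is None or sum(y) < sum(best):
                best = y
    return best


def hamming_approximation(n, edges):
    """The set C of vertices v with y_v > 0."""
    y = optimal_half_integral(n, edges)
    return {v for v in range(n) if y[v] > 0}


def minimum_vertex_covers(n, edges):
    """All minimum-size vertex covers, by exhaustive search (for checking only)."""
    for size in range(n + 1):
        covers = [set(c) for c in itertools.combinations(range(n), size)
                  if all(u in c or v in c for u, v in edges)]
        if covers:
            return covers


# 5-cycle: the LP optimum is y = (1/2, ..., 1/2), so C is the whole vertex set.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
C = hamming_approximation(5, cycle)
best = min(len(C ^ S) for S in minimum_vertex_covers(5, cycle))
print(C, best)
```

On the 5-cycle every minimum cover has three vertices, so C differs from it in two positions, within the n/2 bound.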

The algorithm for Independent Set is as follows: Given the instance (a graph G = (V, E) and
an integer k), run the above algorithm for Vertex Cover to compute a Hamming approximation C
for the Vertex Cover instance $(G, n - k)$. Return the complement of C, i.e., $\bar{C} = V - C$.

To see why the algorithm is correct, recall that a vertex set $S \subseteq V$ is an independent set in a
graph G = (V, E) if and only if its complement $\bar{S} = V - S$ is a vertex cover of G. Thus, $\bar{C}$ has
Hamming distance n/2 from some independent set of size k if and only if C has Hamming distance
n/2 from some vertex cover of size $n - k$.

The algorithm for Clique is as follows: Given the instance (a graph G = (V, E) and an integer
k), run the above algorithm for Independent Set to compute a Hamming approximation C for the
Independent Set instance $(\bar{G}, k)$, where $\bar{G} = (V, \bar{E})$ is the graph whose edge set is the complement
of E. Return C.

The algorithm is correct simply because a vertex set S is a clique in G if and only if S is an
independent set in $\bar{G}$.

This completes the proof of Observation 10.

4 Hardness of the universal language 

This section presents the main result: the hardness of achieving Hamming distance less than
$n/2 + \sqrt{\epsilon n \ln n}$ for U. First we introduce the utility functions H and P and state a relation
between them. Define $H(n, \epsilon) = \sqrt{\epsilon n \ln n}$. Define $P(n, \epsilon) = (n^{2\epsilon}\sqrt{\epsilon \ln n})^{-1}$.

Observation 11 Fix any $\epsilon \in (0, 1/2)$. The probability that a random n-bit string has more than
$n/2 + H(n, \epsilon)$ ones is $\Theta(P(n, \epsilon))$. The probability that a random (n − 1)-bit string has more than
$n/2 + H(n, \epsilon)$ ones is also $\Theta(P(n, \epsilon))$.

This proof, which is standard, is in the Appendix. 
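As a numerical sanity check (not a proof), the exact binomial tail can be compared with P(n, ε). The sketch below assumes fair, independent bits; the function names mirror H and P from the text, and `tail_probability` is an illustrative helper.

```python
import math


def H(n, eps):
    """H(n, eps) = sqrt(eps * n * ln n)."""
    return math.sqrt(eps * n * math.log(n))


def P(n, eps):
    """P(n, eps) = 1 / (n^(2 eps) * sqrt(eps * ln n))."""
    return 1.0 / (n ** (2 * eps) * math.sqrt(eps * math.log(n)))


def tail_probability(n, eps):
    """Exact probability that n fair bits contain more than n/2 + H(n, eps) ones."""
    threshold = math.floor(n / 2 + H(n, eps))
    count = sum(math.comb(n, k) for k in range(threshold + 1, n + 1))
    return count / 2 ** n


for n in (100, 1000, 2000):
    print(n, tail_probability(n, 0.25) / P(n, 0.25))
```

If the observation holds, the printed ratio should stay bounded away from both zero and infinity as n grows.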

Here is the main result for deterministic algorithms: 

Theorem 2 Fix any $\epsilon > 0$. Suppose, for U and $\mathcal{R}_U$, that there is a deterministic polynomial-time
algorithm $A_U$ that achieves Hamming distance $n/2 + H(n, \epsilon)$. Then P=NP.

Proof: Assume that there exist $\epsilon$ and $A_U$ as in the theorem. We will describe a polynomial-time
algorithm for U. Since U is NP-complete, the theorem follows.

The algorithm, given a tuple (M, x, b, n), calls the subroutine shown in Fig. 1 with $u = 2^n$.
If necessary, M is first modified so that the precondition holds. The algorithm returns an n-bit
witness w such that $(M, x, b, w) \in \mathcal{R}_U$ (if such a witness exists, and "none" otherwise).

The intuition is the following. The possible witnesses for machine M are the elements of [u].
The algorithm calls the oracle $A_U$ to get some string v within small Hamming distance from some
actual witness in [u] (if there is one). It then filters the set of possible witnesses to just those in [u]
that are close enough in Hamming distance to v (this filtered set must include an actual witness, if
there is one). It then computes the size u' of the filtered set, and constructs M' so that its possible
witnesses are the elements of [u'], and so that M'(x, w') accepts if $M(x, \phi(w'))$ accepts, where $\phi$ is
a bijection between [u'] and the filtered set. Since u' < u, the algorithm eventually terminates.

$\mathrm{Witness}_\epsilon(M, x, b, u)$: find w such that $(M, x, b, w) \in \mathcal{R}_U$

preconditions: (1) If there is a witness w, then there is one such that $w \in [u]$, where [u] denotes the set of
u binary strings s such that $|s| = |u| = \lceil \log_2 u \rceil$ and s < u (lexicographically). (2) M halts within $|b|^2$
steps.

1. (base case) If $u < |(M, x, b)|^c$ then do the following. For each $w \in [u]$: simulate M(x, w) and return w
if M(x, w) accepts. If no w causes M to accept, return "none".

2. Let $v = A_U(M, x, n)$ and let $\mathcal{N}(v)$ denote the n-bit binary strings within Hamming distance
$n/2 + H(n, \epsilon)$ from v, where n = |u|.

3. Compute $u' = |\mathcal{N}(v) \cap [u]|$. Let $\phi : [u'] \to (\mathcal{N}(v) \cap [u])$ be the bijection mapping the ith string in [u'] to
the ith string in $\mathcal{N}(v) \cap [u]$ (both sets ordered lexicographically).

4. Construct Turing machine M' that does the following on input (x, w'):

   1. If $w' \in [u']$, then return $M(x, \phi(w'))$

   2. else reject.

5. Let $w' = \mathrm{Witness}_\epsilon(M', x, b', u')$, where b' is chosen so M' halts within $|b'|^2$ steps.

6. Return $\phi(w')$, or "none" if w' = "none".

Figure 1: Assuming $A_U(M, x, n)$ computes a string v that achieves Hamming distance $n/2 + H(n, \epsilon)$ to a
witness, $\mathrm{Witness}_\epsilon(M, x, b, u)$ finds such a witness (if one exists). $\mathrm{Witness}_\epsilon$ excludes from the set of possible
witnesses [u] those not in the Hamming-neighborhood of v, then recurses on the filtered set. $H(n, \epsilon) = \sqrt{\epsilon n \ln n}$.
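The filtering loop of Figure 1 can be illustrated on explicit candidate lists. This toy sketch makes several assumptions that the paper does not: candidates are stored as a Python list (so it is exponential, unlike the implicit counting the proof uses), recoding by the bijection is modelled by re-indexing the list, `simulated_oracle` is a hypothetical stand-in for $A_U$ that peeks at the witness and returns the worst answer still allowed, and the base-case size 16 is arbitrary.

```python
import math


def radius(n, eps):
    """Largest integer Hamming distance allowed: floor(n/2 + sqrt(eps * n * ln n))."""
    return math.floor(n / 2 + math.sqrt(eps * n * math.log(n)))


def hamming(a, b):
    """Hamming distance between two non-negative integers viewed as bit strings."""
    return bin(a ^ b).count("1")


def simulated_oracle(candidates, is_witness, r):
    """Hypothetical stand-in for A_U: an index at distance exactly r from the
    index of some witness (or an arbitrary answer if there is no witness)."""
    for idx, w in enumerate(candidates):
        if is_witness(w):
            return idx ^ ((1 << r) - 1)   # flip the r lowest bits
    return 0


def witness_search(candidates, is_witness, eps=0.05, base=16):
    """Repeatedly filter the candidate list to the Hamming ball around the oracle's answer."""
    rounds = 0
    while len(candidates) > base:
        n = (len(candidates) - 1).bit_length()   # n = |u| = ceil(log2 u)
        r = radius(n, eps)
        v = simulated_oracle(candidates, is_witness, r)
        kept = [w for idx, w in enumerate(candidates) if hamming(idx, v) <= r]
        if len(kept) == len(candidates):         # no progress: fall back to brute force
            break
        candidates = kept
        rounds += 1
    found = next((w for w in candidates if is_witness(w)), None)
    return found, rounds


print(witness_search(list(range(4096)), lambda w: w == 2731))
```

Each round keeps the true witness, because its index lies within the allowed radius of the oracle's answer, while discarding every index outside the ball.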

Correctness. Correctness follows from a straightforward inductive proof, based on the fact that
$(M, x, b) \in U$ has a witness in [u] if and only if $(M', x, b') \in U$ has a witness in [u'].

Running time. To prove that the algorithm terminates in polynomial time, we first prove that
$u' \le u - \Omega(2^n P(n, \epsilon))$. (Recall $n = |u| = \lceil \log_2 u \rceil$.) This implies that the algorithm recurses
$O(1/P(n, \epsilon))$ times before u decreases by a factor of 2, and n decreases by at least 1. This implies
that the algorithm recurses $O(n/P(n, \epsilon))$ times total. To finish, we then argue that each step can
be implemented in polynomial time.

Here are the details. Consider the set $[2^{n-1}] \subseteq [u]$. If we choose a random string r in $[2^{n-1}]$, the
probability that r has Hamming distance more than $n/2 + H(n, \epsilon)$ to v is at least the probability
that the last n − 1 bits of r have Hamming distance more than $n/2 + H(n, \epsilon)$ to the last n − 1
bits of v. This is the same as the probability that n − 1 random bits have more than $n/2 + H(n, \epsilon)$
ones. By Observation 11 this probability is $\Omega(P(n, \epsilon))$. Thus, the number of integers in $[2^{n-1}]$
with Hamming distance more than $n/2 + H(n, \epsilon)$ to v is at least $2^{n-1}\,\Theta(P(n, \epsilon))$. Thus, the number
of elements of [u] within Hamming distance $n/2 + H(n, \epsilon)$ from v is at most $u - 2^{n-1}\,\Theta(P(n, \epsilon))$.

To finish, we argue that each step can be implemented in polynomial time. There are two 
non-trivial issues: 

i. The set $\mathcal{N}(v) \cap [u]$ has exponential size, so we need to say how to implement the operations
involving this set. In particular, the algorithm needs to be able to compute the size u' of the
set and to compute the bijection $\phi : [u'] \to \mathcal{N}(v) \cap [u]$.

Let N(s) denote the number of n-bit binary strings having string s as a prefix and having
Hamming distance at most $k = \lfloor n/2 + H(n, \epsilon) \rfloor$ to v. The first |s| bits of such a string agree
with s; the remaining $n - |s|$ bits of such a string differ in at most $k - \ell$ places from v, where
$\ell$ is the Hamming distance between s and the first |s| bits of v. Thus, $N(s) = \sum_{i=0}^{k-\ell} \binom{n-|s|}{i}$.
Thus, given s, N(s) can be computed in polynomial time.

In step 3, the algorithm needs to compute $u' = |\mathcal{N}(v) \cap [u]|$. This is $\sum_{\ell : b_\ell = 1} N(b_1 b_2 \cdots b_{\ell-1} 0)$,
where $b_1 \cdots b_n$ is the n-bit binary representation of u. So u' can be computed in polynomial
time.

In steps 4 and 6, the Turing machine and the algorithm need to be able to compute $\phi(w')$
given w'. Given w', it is easy to compute its rank i in [u']. Then $\phi(w')$ is the largest n-bit
string w such that $|\mathcal{N}(v) \cap [w]| < i$. Since $|\mathcal{N}(v) \cap [w]|$ can be computed as described in the
previous paragraph, w can be found in polynomial time using binary search.

ii. We must check that the padding string b' constructed in each call to $\mathrm{Witness}_\epsilon$ (including the
recursive calls) has size polynomial in the size of the original input (M, x, b, u). This follows
from the observation that only polynomially many recursive calls are made, and with each
one, $|b'| = |b|$

□ 
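The counting and unranking operations from item i of the proof can be sketched and checked against brute force on short strings. The function names are illustrative assumptions; strings are Python strings of '0' and '1', ranks are 1-indexed, and `unrank` ranks within the whole ball around v (which agrees with the bijection whenever the rank is at most u').

```python
import math
from itertools import product


def count_with_prefix(s, v, k):
    """N(s): number of len(v)-bit strings with prefix s within distance k of v."""
    ell = sum(a != b for a, b in zip(s, v))   # distance already used by the prefix
    if ell > k:
        return 0
    rest = len(v) - len(s)
    return sum(math.comb(rest, i) for i in range(min(k - ell, rest) + 1))


def count_below(u_bits, v, k):
    """Number of strings within distance k of v that are lexicographically below u_bits."""
    total = 0
    for pos, bit in enumerate(u_bits):
        if bit == "1":
            # Strings sharing u's first `pos` bits and then having 0 where u has 1.
            total += count_with_prefix(u_bits[:pos] + "0", v, k)
    return total


def unrank(i, v, k):
    """The i-th string of the ball around v: the largest w with count_below(w) < i."""
    n = len(v)
    lo, hi = 0, 2 ** n - 1                    # invariant: count_below(lo) < i
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_below(format(mid, f"0{n}b"), v, k) < i:
            lo = mid
        else:
            hi = mid - 1
    return format(lo, f"0{n}b")


def ball_brute_force(v, k):
    """All strings within distance k of v, in lexicographic order (for checking only)."""
    strings = ("".join(bits) for bits in product("01", repeat=len(v)))
    return [s for s in strings if sum(a != b for a, b in zip(s, v)) <= k]


print([unrank(i, "10110", 2) for i in range(1, 4)])
```

Binary search is valid here because `count_below` is nondecreasing in its first argument.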

Theorem 3 Suppose that, for U and $\mathcal{R}_U$, some randomized polynomial-time algorithm achieves
Hamming distance $n/2 + H(n, \epsilon)$ with probability $1 - O(P(n, \epsilon)/n)$. Then RP=NP.

Proof: Consider the algorithm $\mathrm{Witness}_\epsilon$ described in the proof of Theorem 2 with input
(M, x, b, u). $\mathrm{Witness}_\epsilon$ calls the algorithm $A_U(M', x, \ell)$ with $\ell \in \{\lfloor c \log n \rfloor, \ldots, n\}$. For each value
of $\ell$, $\mathrm{Witness}_\epsilon$ makes $O(1/P(\ell, \epsilon))$ calls to $A_U$.

Suppose $A_U$ were randomized and had probability $O(P(\ell, \epsilon)/\ell)$ of failure on any input $(M', x, \ell)$.
Then the probability that none of the calls to $A_U$ would fail would be at least

$$\prod_{\ell = c \log n}^{n} \left(1 - O(P(\ell, \epsilon)/\ell)\right)^{O(1/P(\ell, \epsilon))} = \exp\left(-O\left(\sum_{\ell = c \log n}^{n} 1/\ell\right)\right) = 1/n^{O(1)}.$$

Thus, the algorithm $\mathrm{Witness}_\epsilon$ described in that proof would have probability $1/n^{O(1)}$ of producing
a witness in polynomial time. This shows that $U \in$ RP. Since U is NP-complete, the result follows.

□ 
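
The bound on the probability that no call fails can be unpacked as follows. This is a sketch under assumptions not stated in the proof: $a$ and $a'$ are hypothetical constants such that each call fails with probability at most $a P(\ell, \epsilon)/\ell \le 1/2$ and at most $a'/P(\ell, \epsilon)$ calls are made for each $\ell$, and $\lfloor c \log n \rfloor \ge 1$. It uses the standard inequalities $1 - x \ge e^{-2x}$ for $0 \le x \le 1/2$ and $\sum_{\ell=1}^{n} 1/\ell \le 1 + \ln n$.

```latex
\begin{align*}
\prod_{\ell=\lfloor c\log n\rfloor}^{n}
  \Bigl(1 - \frac{a\,P(\ell,\epsilon)}{\ell}\Bigr)^{a'/P(\ell,\epsilon)}
 &\ge \prod_{\ell=\lfloor c\log n\rfloor}^{n}
      \exp\Bigl(-\frac{2a\,P(\ell,\epsilon)}{\ell}\cdot\frac{a'}{P(\ell,\epsilon)}\Bigr) \\
 &= \exp\Bigl(-2aa'\sum_{\ell=\lfloor c\log n\rfloor}^{n}\frac{1}{\ell}\Bigr) \\
 &\ge \exp\bigl(-2aa'(1+\ln n)\bigr)
  = e^{-2aa'}\,n^{-2aa'}
  = 1/n^{O(1)}.
\end{align*}
```

The factors $P(\ell, \epsilon)$ cancel, which is why the exponent reduces to a harmonic sum.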

5 Discussion 

Is it likely that the hardness results for $\mathcal{U}$ are tight?

Among polynomial-time algorithms, the best randomized one that we know of is the trivial 
algorithm: guess n random bits. By Observation [TT] this algorithm achieves Hamming distance 
n/2 + H(n,e) with probability 0(P(n,e)). Thus, the hardness result for randomized algorithms 
for $\mathcal{U}$ is essentially tight, at least for this particular trade-off of approximation parameter and the
probability of success. 

The best deterministic polynomial-time algorithm we know of is a naive algorithm. It achieves
Hamming distance $n - c$ for any fixed $c$. The algorithm is as follows: Test each $n$-bit string with $c$
or fewer 1's. If one is a witness, return it; otherwise return $1^n$. Note that $1^n$ is within Hamming
distance $n - c$ of all untested strings, and therefore within Hamming distance $n - c$ of any witness
(if the algorithm returns $1^n$).
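
The naive algorithm can be sketched as follows. This is an illustration only: the verifier is modeled as a hypothetical Python callable `is_witness`, and the fallback answer is the all-ones string, chosen because every untested string has at least $c + 1$ ones and therefore differs from all-ones in at most $n - c - 1$ positions.

```python
from itertools import combinations
from typing import Callable


def naive_witness_approx(n: int, c: int, is_witness: Callable[[str], bool]) -> str:
    """Return a witness with at most c ones if one exists, else the all-ones string."""
    for ones in range(c + 1):
        for positions in combinations(range(n), ones):
            chosen = set(positions)
            candidate = "".join("1" if j in chosen else "0" for j in range(n))
            if is_witness(candidate):
                return candidate      # exact witness found: distance 0
    # No tested string is a witness, so every witness has at least c + 1 ones.
    return "1" * n
```

For fixed $c$ the loop makes $\sum_{j \le c} \binom{n}{j} = O(n^c)$ verifier calls, which is polynomial in $n$.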

A simple adversary argument shows that this algorithm is essentially optimal among deterministic
algorithms that use the verifier $M$ as a black box. (Such an algorithm, given an instance $I$,
determines information about $I$ only by querying the NP verifier $M(I, x^*)$ with various potential
witnesses $x^*$.) The adversary behaves as follows. Whenever the algorithm queries $M(I, x^*)$ with a
given choice of $x^*$, the adversary has $M$ return "no". Suppose the algorithm runs in $o(n^{c-1})$ time
for some constant $c$, before it returns its answer $x$. There are at least $\binom{n}{n-c+1} = \Omega(n^{c-1})$ strings
whose Hamming distance to $x$ is $n - c + 1$. At least one of these strings $x^*$ was not queried by the
algorithm. The adversary can take $x^*$ to be the true witness (taking $M(I, x^*) = 1$ only for this
$x^*$). This means the algorithm's answer, $x$, does not achieve Hamming distance $n - c$. In sum, any
"black-box" deterministic algorithm that achieves Hamming distance $n - c$ must take time $\Omega(n^{c-1})$
in the worst case.
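
The adversary's final move can be made concrete with a small sketch. It assumes the adversary has recorded the set of queried strings and has seen the algorithm's answer `x`; the function name and the representation of queries as a Python set are illustrative, not taken from the paper.

```python
from itertools import combinations


def adversary_witness(x: str, queried: set, c: int):
    """Return an unqueried string at Hamming distance n - c + 1 from x, or None.

    Such a string agrees with x in exactly c - 1 positions, so there are
    C(n, c - 1) candidates to try. Assumes c >= 1.
    """
    n = len(x)
    for keep in combinations(range(n), c - 1):
        keep_set = set(keep)
        # Keep the chosen positions of x and flip every other bit.
        candidate = "".join(
            bit if j in keep_set else ("1" if bit == "0" else "0")
            for j, bit in enumerate(x)
        )
        if candidate not in queried:
            return candidate
    return None
```

If the algorithm made fewer than $\binom{n}{c-1}$ queries, the function always returns a string, which the adversary can then declare to be the only witness.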

Given the "universality" of $\mathcal{U}$, it may be difficult to design an algorithm for $\mathcal{U}$ other than a
black-box algorithm as described above. If so, by the simple argument above, it will be difficult to
find a deterministic polynomial-time algorithm that improves on the naive upper bound of $n - O(1)$.

On the other hand, there is also a barrier to improving the corresponding lower bound (of
$n/2 + O(\sqrt{n \log n})$ in the hardness result for deterministic algorithms for $\mathcal{U}$). The barrier is essentially
that the proof technique would have to distinguish a deterministic algorithm from an algorithm
that, although randomized, has only exponentially small likelihood of failure. More specifically,
note that the proof in this paper is essentially a reduction from $\mathcal{U}$ to the problem of achieving
Hamming distance $n/2 + O(\sqrt{n \log n})$ for $\mathcal{U}$. The proof describes how, given an instance $I$ of $\mathcal{U}$, to
reduce $I$ to a sequence of instances of the easier problem. For any such reduction, it would work
just as well to solve the sequence of easier instances with high probability. And, for any bound much
above $n/2 + O(\sqrt{n \log n})$, the naive randomized algorithm will do just that, and so, combined with
the reduction, would give a randomized polynomial-time algorithm for $\mathcal{U}$, showing that RP=NP.
Thus, it is unlikely that any proof that uses $A_{\mathcal{U}}$ as a black box (that is, a proof by reduction) can
establish a lower bound above $n/2 + O(\sqrt{n \log n})$, even for deterministic algorithms.

It might be interesting to explore this question in the context of non-uniform complexity. For
example, it seems likely that standard derandomization methods might yield a deterministic P/poly
algorithm for achieving Hamming distance $n/2 + O(\sqrt{n \log n})$ for $\mathcal{U}$.

Achieving Hamming distance $n/2$ for $\mathcal{U}$ is NP-hard. Achieving Hamming distance $n/2$ for
Vertex Cover (and Clique and Independent Set) is in P. This raises a natural question: how hard
is it to achieve Hamming distance $n/2$ for other natural problems? Is there a natural problem in
NP (other than $\mathcal{U}$) for which it is NP-hard to achieve Hamming distance $n/2$?
Appendix

Here is the proof of Observation 2.

Proof: Let $M_L$ be a Turing machine that decides the witness relation $(x, w) \in \mathcal{R}_L$ in polynomial
time. Let $b$ be a string of $t(|x|)$ 1's.

Given $(x, n)$, the algorithm $A_L$ does the following: run $A_{\mathcal{U}}$ on $(M_L, x, b, n)$ and return the result.
This takes polynomial time because $A_{\mathcal{U}}$ takes polynomial time and $|b| \le t(|x|)$.

By the definition of the universal language, the size-$n$ witnesses for $x \in L$ are exactly the size-$n$
witnesses for $(M_L, x, b) \in \mathcal{U}$. Thus, whatever string $A_{\mathcal{U}}$ returns, the string is a size-$n$ witness for
$(M_L, x, b) \in \mathcal{U}$ if and only if it is a size-$n$ witness for $x \in L$.

Thus, if $x \in L$ has a size-$n$ witness, then $A_L(x)$ is within Hamming distance $H(|w|)$ of some
such witness with probability at least $P(|w|)$. □

Here is the proof of Observation [Til 
Proof: Assume for simplicity of notation that $n$ is even. (The case when $n$ is odd is similar.) Let
$H = \lceil H(n, \epsilon) \rceil$. Let $p_i = 2^{-n} \binom{n}{n/2+i}$. The probability that $n$ random bits have at least $n/2 + H$ ones is
$\sum_{i \ge H} p_i$. Note that

$$\frac{p_{i+1}}{p_i} = \frac{n/2 - i}{n/2 + i + 1} \approx \frac{1 - 2i/n}{1 + 2i/n}.$$

From this it follows that the sum $\sum_{i \ge H} p_i$ is proportional to the sum of its first $\Theta(n/H)$ or so terms,
and that each of the first $\Theta(n/H)$ terms is proportional to the first term. Thus, the entire sum is
$\Theta((n/H)\, p_H)$. A calculation shows that $p_H = \Theta(n^{-2\epsilon - 1/2})$. Thus, the entire sum is $\Theta(n^{-2\epsilon + 1/2}/H)$.
Plugging in the definition of $H$ gives the first claim.


The second claim follows easily because the probability in question is at least $\sum_{i \ge H+2} p_i$, which
by the above considerations is also $\Theta((n/H)\, p_H)$.
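
The ratio of consecutive terms and the comparison between the tail sum and $(n/H)\,p_H$ can be checked numerically with exact arithmetic. This is a minimal sketch for even $n$, assuming $p_i$ denotes the probability that $n$ random bits contain exactly $n/2 + i$ ones; the helper names and the choice of $H$ passed in are illustrative.

```python
from fractions import Fraction
from math import comb


def p(n: int, i: int) -> Fraction:
    """Probability that n fair random bits contain exactly n/2 + i ones (n even)."""
    return Fraction(comb(n, n // 2 + i), 2**n)


def tail(n: int, H: int) -> Fraction:
    """Probability of at least n/2 + H ones: the sum of p(n, i) for i >= H."""
    return sum((p(n, i) for i in range(H, n // 2 + 1)), Fraction(0))


def normalized_tail(n: int, H: int) -> float:
    """tail(n, H) divided by (n / H) * p(n, H), the estimate used in the proof."""
    return float(tail(n, H) / (Fraction(n, H) * p(n, H)))
```

Evaluating `normalized_tail(n, H)` for growing $n$, with $H$ on the order of $\sqrt{n \log n}$, gives an empirical view of the claim that the tail is proportional to $(n/H)\,p_H$; the sketch does not prove the asymptotic statement.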

For the thorough reader, here are the intermediate steps in the calculation that $p_H = \Theta(n^{-2\epsilon - 1/2})$.
We use Stirling's approximation $i! = \Theta((i/e)^i \sqrt{i})$, then we use $(1 + a)^b = \exp(b \ln(1 + a)) =
\exp(b(a + O(a^2))) = \Theta(\exp(ab))$ when $a^2 b = O(1)$ and $|a| < 1$.

$$\frac{2^n}{\sqrt{n}\,\binom{n}{n/2+H}} = \Theta\bigl((1 - 2H/n)^{n/2 - H}\,(1 + 2H/n)^{n/2 + H}\bigr) = \Theta(\exp(4H^2/n)) = \Theta(n^{2\epsilon}).$$

□ 